Very Large Vocabulary ASR for Spoken Russian with Syntactic and Morphemic Analysis
Authors
Abstract
In this paper, we present a word-based, very large vocabulary automatic speech recognition system for Russian. We propose novel methods for organizing the lexicon and the language model. A two-level morpho-phonemic prefix graph, which uses information on the morphemic structure of lexical units, is suggested for a compact representation of the pronunciation vocabulary and the search space. This model is more compact than a lexical tree or a linear vocabulary and speeds up the recognition process. Syntactic analysis of the training text corpus, combined with statistical analysis, is suggested for generating N-gram language models. The syntax-based Russian language model takes into account long-distance syntactic dependencies between word pairs. The results show that the syntactic-statistical language model gives a 5% relative improvement in word and letter error rates over the baseline models.
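The compactness argument in the abstract can be illustrated with a minimal sketch. It compares only the two baselines the abstract names, a linear vocabulary and a lexical prefix tree, and does not reproduce the authors' two-level morpho-phonemic prefix graph. The toy lexicon, its simplified phone strings, and the function names are hypothetical and chosen purely for illustration.

```python
def linear_size(lexicon):
    """Phone states needed when every word keeps its own phone chain."""
    return sum(len(phones) for phones in lexicon.values())


def prefix_tree_size(lexicon):
    """Phone nodes in a lexical prefix tree (root excluded).

    Each distinct phone prefix corresponds to exactly one tree node,
    so counting distinct prefixes counts the nodes.
    """
    prefixes = set()
    for phones in lexicon.values():
        for end in range(1, len(phones) + 1):
            prefixes.add(tuple(phones[:end]))
    return len(prefixes)


# Hypothetical toy lexicon: inflected forms of one stem, with simplified
# (not phonetically accurate) phone strings.
TOY_LEXICON = {
    "stol": ["s", "t", "o", "l"],
    "stola": ["s", "t", "o", "l", "a"],
    "stolu": ["s", "t", "o", "l", "u"],
    "stolom": ["s", "t", "o", "l", "o", "m"],
    "stoly": ["s", "t", "o", "l", "y"],
}

if __name__ == "__main__":
    print("linear vocabulary states:", linear_size(TOY_LEXICON))
    print("prefix tree nodes:", prefix_tree_size(TOY_LEXICON))
```

In this toy case the shared stem is stored once in the tree, which is why highly inflected languages such as Russian benefit from prefix sharing. A graph that also exploits morphemic structure, as the abstract proposes, could additionally share material that a plain tree stores separately for each word.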
Similar resources
Influence of Morphemic Analysis on Vocabulary Learning Among Palestinian 10th Graders
The aim of this study is to identify the influence of the morphological analysis strategy employed by Palestinian female 10th-grade students in guessing and manipulating complex words, in addition to using these words in meaningful sentences. This study involved 75 female students from Idna Secondary School for Girls in Hebron Governorate. The sample of the study was assigned to a control group (37 st...
Large vocabulary Russian speech recognition using syntactico-statistical language modeling
Speech is the most natural way of human communication, and in order to achieve convenient and efficient human–computer interaction, implementation of state-of-the-art spoken language technology is necessary. Research in this area has been traditionally focused on several main languages, such as English, French, Spanish, Chinese or Japanese, but some other languages, particularly Eastern European ...
Automatic scoring of non-native children's spoken language proficiency
In this study, we aim to automatically score the spoken responses from an international English assessment targeted to non-native English-speaking children aged 8 years and above. In contrast to most previous studies focusing on scoring of adult non-native English speech, we explored automated scoring of child language assessment. We developed automated scoring models based on a large set of fe...
Improving Spoken Language Using Word Confusion
A natural language spoken dialog system includes a large vocabulary automatic speech recognition (ASR) engine, whose output is used as the input of a spoken language understanding component. Two challenges in such a framework are that the ASR component is far from being perfect and the users can say the same thing in very different ways. So, it is very important to be tolerant to recognition er...
The Impact of Language Learning Activities on the Spoken Language Development of 5-6-Year-Old Children in Private Preschool Centers of Langroud
N. Bagheri, M.A.; E. Abbasi, Ph.D.; M. GeramiPour, Ph.D. The present study was conducted to investigate the impact of language learning activities on development of spoken language in 5-6-year-old children at private preschool center...